MINIMUM-CROSS-ENTROPY SPECTRAL ANALYSIS OF MULTIPLE SIGNALS


I.  INTRODUCTION  AND  BACKGROUND 

We  present  here  an  information-theoretic  method  for  simultaneously 
estimating  a  number  of  power  spectra  when  a  prior  estimate  of  each  is 
available  and  new  information  is  obtained  in  the  form  of  values  of  the 
autocorrelation  function  of  their  sum.  The  method  applies  for  instance  when 
one  obtains  autocorrelation  measurements  for  a  signal  with  independent 
additive  interference,  and  one  has  some  prior  knowledge  concerning  the  signal 
and  the  noise  spectra;  the  result  is  signal-  and  noise-spectrum  estimates  that 
take  both  the  prior  estimates  and  the  autocorrelation  information  into 
account.  One  thus  obtains  a  procedure  for  noise  suppression  that  offers  some 
advantages  over  more  traditional  procedures,  such  as  those  based  on  spectral 
subtraction. 

The present method is a generalization of minimum-cross-entropy spectral
analysis [1], which is in turn a generalization of maximum-entropy (or
linear-predictive or autoregressive) spectral analysis [2], [3]. All these
methods proceed from autocorrelation values. Minimum-cross-entropy spectral
analysis (MCESA) differs from maximum-entropy spectral analysis (MESA) in that
it explicitly uses a prior estimate of the power spectrum; it reduces to MESA
as a special case when the prior estimate is uniform and one of the given
autocorrelation values is for zero lag. The present method, multi-signal
MCESA, differs from MCESA in that it treats an arbitrary number of independent
spectra simultaneously; in the special case of a single spectrum, it becomes
identical to MCESA.

Manuscript submitted February 17, 1981.

MESA  may  be  regarded  as  an  application  of  the  principle  of  maximum  entropy 
[4], [5]; single- and multi-signal MCESA are applications of a generalization
of that principle, the principle of minimum cross entropy (also called minimum
discrimination information, directed divergence, I-divergence, relative
entropy, or Kullback-Leibler number) [6], [7], [8], [9], [10], [11]. In the
remainder  of  this  section,  we  describe  these  spectrum-analysis  methods  further 
and  include  some  background  on  the  principle  of  minimum  cross  entropy. 

Section  II  contains  a  derivation  of  our  multiple-signal  estimator,  and  section 
III  discusses  a  few  of  its  general  properties.  Section  IV  presents  two 
numerical  examples,  one  of  which  is  based  on  measured  samples  of  speech 
signals  and  noise.  Finally,  section  V  contains  a  concluding  discussion. 


A.  MESA  and  MCESA 


MESA  addresses  the  following  problem:  estimate  the  power  spectrum  S(f)  of 
a  real,  band-limited,  stationary  process,  given  values  of  the  autocorrelation 
function 

$$ R(t) = 2 \int_0^W df \, S(f) \cos 2\pi f t $$

for finitely many lags $t = t_r$, $r = 0, \ldots, M$. (Here W is the bandwidth.)

The solution proposed by Burg [2], [3] is to choose the estimate Q of S that
maximizes

$$ \int_0^W df \, \log Q(f) \tag{1} $$

subject to the constraint that the autocorrelation function assume the given
values:

$$ R(t_r) = 2 \int_0^W df \, Q(f) \cos 2\pi f t_r . \tag{2} $$

The resulting estimator has the form

$$ Q(f) = \frac{1}{\sum_r 2\beta_r \cos 2\pi f t_r} , \tag{3} $$

where the coefficients $\beta_r$ ($r = 0, \ldots, M$) are chosen so that Q satisfies (2).

MCESA is applicable to the problem of estimating S(f) when, in addition to
the autocorrelation values, a prior estimate P of S is given; P may be thought
of as the best guess at S we could make in the absence of autocorrelation
data. The MCESA estimator has the form [1]

$$ Q(f) = \frac{1}{1/P(f) + \sum_r 2\beta_r \cos 2\pi f t_r} , \tag{4} $$

where again the $\beta_r$ are chosen so that Q satisfies the constraints (2). We
call Q the posterior estimate of S based on the prior estimate P and
constraints (2). This estimator can be obtained directly from the
minimum-cross-entropy principle [1]; it can also be obtained by minimizing the
Itakura-Saito distortion measure [12]

$$ \int_0^W df \left[ \frac{Q(f)}{P(f)} - \log \frac{Q(f)}{P(f)} - 1 \right] $$

subject to (2) [1]. When P(f) is uniform, and one of the autocorrelation
values is at lag zero (say $t_0 = 0$), we can write (4) in the form (3), since
the constant 1/P can be absorbed into the coefficient $\beta_0$. Thus in this case
MCESA reduces to MESA.

For multi-signal MCESA, the problem is to estimate the power spectra
$S_i(f)$ of a number of independent processes, given values of the total
autocorrelation

$$ R(t) = 2 \sum_i \int_0^W df \, S_i(f) \cos 2\pi f t $$

and a prior estimate $P_i$ for each $S_i$. The estimator has the form

$$ Q_i(f) = \frac{1}{1/P_i(f) + \sum_r 2\beta_r \cos 2\pi f t_r} , \tag{5} $$

where the $\beta_r$ are chosen so that the constraint equations

$$ R(t_r) = 2 \sum_i \int_0^W df \, Q_i(f) \cos 2\pi f t_r \tag{6} $$

are satisfied. Note that the summation term in the denominator in (5) is
independent of i. In Section II we derive the estimates (5) directly from
the principle of minimum cross entropy. We also show that they can be
obtained by minimizing the sum

$$ \sum_i \int_0^W df \left[ \frac{Q_i(f)}{P_i(f)} - \log \frac{Q_i(f)}{P_i(f)} - 1 \right] $$

of Itakura-Saito distortions subject to the constraints (6). Equations (5)
and (6) reduce to (4) and (2) when there is only one spectrum $S_i$; thus
multi-signal MCESA reduces to ordinary MCESA in case there is only one
signal.


B.  Cross-Entropy  Minimization

The principle of minimum cross entropy is a general method for inference
about probability distributions when information is available in the form of
expectation values of known functions.

Let $q^\dagger$ be a probability density on a space of states $\mathbf{x}$ of some system.
Suppose that $q^\dagger$ is not known, but there is some prior density p (on the same
space) that is our current estimate of $q^\dagger$. Now suppose we gain new
information about $q^\dagger$ in the form of expectation values

$$ \int d\mathbf{x} \, q^\dagger(\mathbf{x}) g_r(\mathbf{x}) = \bar{g}_r \tag{7} $$

of known functions $g_r$. In general, these constraints do not determine $q^\dagger$
uniquely: the equations (7) are satisfied by other densities q than $q^\dagger$ (but
not necessarily by p). The problem to be solved is, given p and the
constraints (7), to make the best possible choice of a new (or posterior)
estimate q of $q^\dagger$. The principle of minimum cross entropy states that one
should choose that density q, among all the densities that satisfy the
constraints, that has the least cross entropy

$$ H(q,p) = \int d\mathbf{x} \, q(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})} \tag{8} $$

with respect to p.

Given a positive prior probability density p, if there exists a posterior
q that minimizes the cross entropy and satisfies the constraints (7), it has
the form

$$ q(\mathbf{x}) = p(\mathbf{x}) \exp\Big( -\lambda - \sum_r \beta_r g_r(\mathbf{x}) \Big) , \tag{9} $$

with the possible exception of a set of states on which the constraints imply
that q vanishes [6, p. 38], [10]. In (9), $\lambda$ and the $\beta_r$ are Lagrange multipliers
whose values are determined by the normalization constraint

$$ \int d\mathbf{x} \, q(\mathbf{x}) = 1 \tag{10} $$

and by the constraints (7), respectively. Conversely, if there are values
for $\lambda$ and the $\beta_r$ for which the constraints are satisfied, then the solution
exists and is given by (9) [10]. Conditions for existence of solutions are
given by Csiszár [10].

One could imagine using a procedure based on minimization of some function
of q and p other than H(q,p). In what sense does minimizing cross entropy
yield the best estimate q of $q^\dagger$? One answer to this question is provided by
recent work [7] that characterizes cross-entropy minimization as an inference
procedure by means of certain consistency axioms. In describing this work, it
is useful to view an inference procedure as an operator $\circ$ that takes two
arguments, a prior probability density p and new constraint information I of
the form (7), and yields a posterior probability density $p \circ I$. It is assumed
in [7] that $\circ$ is implemented by minimization of some well behaved function
H'(q,p) -- that is, that $q = p \circ I$ is defined as that density, among all the
densities that satisfy the constraints I, for which H'(q,p) is least. It is
further assumed that the operator $\circ$ satisfies consistency axioms that,
informally, require different ways of taking information I into account (for
example, in different coordinate systems) to lead to equivalent results. It
is then shown to follow from the assumptions that $p \circ I$ equals the result of
minimizing the cross entropy H(q,p). The axioms do not imply that H' must be
H -- for instance a monotonic function of H would do just as well -- but they
do uniquely characterize the result $p \circ I$ of the minimization: cross-entropy
minimization is uniquely correct in the sense that minimization of any other
functional either gives the same result or leads to a contradiction with one
of the axioms.

Other justifications for the use of cross-entropy minimization can be
based on cross entropy's properties as an information measure [6], [10], [13],
[14]. For instance, H(q,p), informally speaking, measures the distortion,
"information dissimilarity," or "information divergence" of q from p. H(q,p)
can be interpreted as the amount of information needed to change a prior p
into the posterior q or to determine q given p [14]; indeed,

$$ H(q,p) = H(q^\dagger,p) - H(q^\dagger,q) \tag{11} $$

holds when $q = p \circ I$ is defined by cross-entropy minimization [10], [14]. In
these terms the minimum-cross-entropy principle is intuitively justified as
the choice of posterior q that introduces the least distortion, least
additional information, or fewest unjustified assumptions consistent with the
given constraints. Because H(q,p) is nonnegative, it follows from (11) that
$H(q^\dagger,q) \le H(q^\dagger,p)$. Thus the
posterior q is closer to $q^\dagger$ in the cross-entropy sense than is the prior p.
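
Identity (11) can be checked numerically in a small discrete setting. The sketch below is only an illustration under stated assumptions: the five-state system, the single constraint function g, the prior p, and the "true" density q_true are all invented for the check, and the multiplier in (9) is found by simple bisection rather than by any method described in the cited references.

```python
import math

def cross_entropy(q, p):
    """Discrete analog of (8): H(q, p) = sum_k q_k log(q_k / p_k)."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0.0)

def minimize_cross_entropy(p, g, g_bar, lo=-50.0, hi=50.0, iters=200):
    """Posterior of form (9), q_k proportional to p_k exp(-beta g_k),
    with beta chosen by bisection so that sum_k q_k g_k = g_bar."""
    def tilted(beta):
        w = [pk * math.exp(-beta * gk) for pk, gk in zip(p, g)]
        z = sum(w)  # normalization, playing the role of exp(lambda)
        return [wk / z for wk in w]

    def expectation(q):
        return sum(qk * gk for qk, gk in zip(q, g))

    # The expectation of g decreases monotonically as beta increases.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expectation(tilted(mid)) > g_bar:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Hypothetical example data (not from the paper).
g = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [0.40, 0.30, 0.15, 0.10, 0.05]
q_true = [0.05, 0.10, 0.20, 0.35, 0.30]
g_bar = sum(qk * gk for qk, gk in zip(q_true, g))

q = minimize_cross_entropy(p, g, g_bar)
lhs = cross_entropy(q, p)
rhs = cross_entropy(q_true, p) - cross_entropy(q_true, q)
print(lhs, rhs)  # the two sides of (11) should agree to rounding error
```

Here q_true plays the role of the unknown density: only its expectation g_bar is passed to the minimization, yet the posterior ends up no farther from q_true than the prior was.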

Yet another justification for cross-entropy minimization is provided by
the "expectation-matching" property [14], which states that for an arbitrary
fixed density $q^\dagger$ and densities q of the form (9), $H(q^\dagger,q)$ is least when the
expectations of q match those of $q^\dagger$. In particular, it follows that $q = p \circ I$
is not only the density satisfying (7) that minimizes H(q,p), but also the
density of the form (9) that minimizes $H(q^\dagger,q)$. Hence $p \circ I$ is not only
closer to $q^\dagger$ than is p, but it is the closest possible density of the form
(9). The expectation-matching property is a generalization of a property of
orthogonal polynomials [15, p. 12] that, in the case of speech analysis [16],
is called the "correlation-matching property" [17, ch. 2]. For further
justifications see [7], [14].

II.  DERIVATION 


We assume the time-domain signal is a sum of stationary random processes
$s_i(t)$, $i = 1, \ldots, K$. In many applications, K will be 2 -- one signal
process and one noise process -- but the case of arbitrary K is no harder than
K = 2, so we do the derivation in that generality. It is convenient to work
with discrete-spectrum approximations to the $s_i$ [1], [18],

$$ s_i(t) = \sum_{k=1}^{N} \left( a_{ik} \cos 2\pi f_k t + b_{ik} \sin 2\pi f_k t \right) , $$

where the $a_{ik}$ and the $b_{ik}$ are random variables and the $f_k$ are nonzero
frequencies, not necessarily uniformly spaced. We write $x_{ik}$ for the power
of the process $s_i$ at frequency $f_k$,

$$ x_{ik} = a_{ik}^2 + b_{ik}^2 , $$

and will describe the processes in terms of a joint probability density
$q^\dagger(\mathbf{x}) = q^\dagger(\mathbf{x}_1, \ldots, \mathbf{x}_K)$, where $\mathbf{x}_i$ stands for $(x_{i1}, \ldots, x_{iN})$. The
marginal density for each $\mathbf{x}_i$ is defined by

$$ q_i^\dagger(\mathbf{x}_i) = \int \Big( \prod_{j \ne i} d\mathbf{x}_j \Big) \, q^\dagger(\mathbf{x}) , $$

where each component $x_{jk}$ of the variables of integration ranges from 0 to $\infty$.
Let $P_{ik} = P_i(f_k)$ be prior estimates of the power spectra of the
$s_i$. Then we may take

$$ p_i(\mathbf{x}_i) = \prod_{k=1}^{N} (1/P_{ik}) \exp(-x_{ik}/P_{ik}) \tag{12} $$

as prior estimates of $q_i^\dagger(\mathbf{x}_i)$. The assumed exponential form is
equivalent to a Gaussian distribution in the amplitude variables $a_{ik}$ and
$b_{ik}$; for justification of its use, see [1], [19]. Note that the
coefficients are chosen so that the expected value of the power $x_{ik}$ of the
process $s_i$ at frequency $f_k$ is equal to the prior estimate:

$$ P_{ik} = \int d\mathbf{x}_i \, p_i(\mathbf{x}_i) x_{ik} . $$

Since we assume independence of $\mathbf{x}_i$ and $\mathbf{x}_j$ ($i \ne j$), our prior estimate of
$q^\dagger$ becomes

$$ p(\mathbf{x}) = \prod_{i=1}^{K} \prod_{k=1}^{N} (1/P_{ik}) \exp(-x_{ik}/P_{ik}) . \tag{13} $$

The spectral power of each process $s_i$ at frequency $f_k$ is given by

$$ T_{ik} = \int d\mathbf{x} \, q^\dagger(\mathbf{x}) x_{ik} , \tag{14} $$

and the autocorrelation function of each $s_i$ is

$$ R_{ir} = \sum_{k=1}^{N} c_{rk} T_{ik} , \tag{15} $$

where $c_{rk} = 2 \cos 2\pi t_r f_k$. Suppose we obtain information about $q^\dagger$ in
the form of autocorrelation values for the sum of the $s_i$,

$$ R_r = \sum_{i=1}^{K} R_{ir} , $$

$r = 0, \ldots, M$, where $t_0 = 0$. In view of (14) and (15), this has the form of
linear constraints on expectation values of $q^\dagger$:

$$ R_r = \int d\mathbf{x} \, q^\dagger(\mathbf{x}) \sum_{i=1}^{K} \sum_{k=1}^{N} c_{rk} x_{ik} . \tag{16} $$

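Applying the principle of minimum cross entropy to the prior (13) and
constraints (16) yields a posterior estimate q of $q^\dagger$ given by

$$ q(\mathbf{x}) = p(\mathbf{x}) \exp\Big( -\lambda - \sum_{r=0}^{M} \beta_r \sum_{i=1}^{K} \sum_{k=1}^{N} c_{rk} x_{ik} \Big) $$

$$ = e^{-\lambda} \prod_{i=1}^{K} \prod_{k=1}^{N} (1/P_{ik}) \exp\Big( -x_{ik}/P_{ik} - \sum_r \beta_r c_{rk} x_{ik} \Big) $$

$$ = e^{-\lambda} \prod_{i=1}^{K} \prod_{k=1}^{N} (1/P_{ik}) \exp(-A_{ik} x_{ik}) , $$

where $\lambda$ and the $\beta_r$ are Lagrange multipliers corresponding to the constraints
(10) and (16), respectively, and $A_{ik} = 1/P_{ik} + \sum_r \beta_r c_{rk}$. Because of
the normalization constraint (10), this becomes

$$ q(\mathbf{x}) = \prod_{i=1}^{K} \prod_{k=1}^{N} A_{ik} \exp(-A_{ik} x_{ik}) . \tag{17} $$

The posterior estimate of the power spectrum of $s_i$ is

$$ Q_{ik} = \int d\mathbf{x} \, q(\mathbf{x}) x_{ik} = 1/A_{ik} ; $$

thus

$$ Q_{ik} = \frac{1}{1/P_{ik} + \sum_r \beta_r c_{rk}} , \tag{18} $$

where the $\beta_r$ must be chosen so that the constraints

$$ R_r = \sum_{i=1}^{K} \sum_{k=1}^{N} c_{rk} Q_{ik} \tag{19} $$

are satisfied. Equations (18) and (19) are simply discrete analogs of (5) and
(6).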
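
The text does not say how the multipliers in (18) are to be computed from the constraints (19). One possibility, offered here purely as a sketch and not as the authors' procedure, is a damped Newton iteration on the convex function whose gradient is the constraint residual. The function names, the test frequencies, the priors, and the "true" multipliers below are all hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n + 1):
                m[r][cc] -= f * m[col][cc]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def multisignal_mcesa(priors, c, R, iters=50, tol=1e-10):
    """priors[i][k] = P_ik, c[r][k] = c_rk, R[r] = R_r.
    Returns (beta, Q) with Q[i][k] of the form (18) satisfying (19)."""
    n_r, n_k = len(R), len(c[0])

    def denominators(beta):
        # The sum over r is shared by all signals i, as noted after (5).
        common = [sum(beta[r] * c[r][k] for r in range(n_r)) for k in range(n_k)]
        return [[1.0 / P_i[k] + common[k] for k in range(n_k)] for P_i in priors]

    def dual(beta, A):
        # Convex in beta; its gradient is R_r - sum_ik c_rk / A_ik.
        return (sum(b * Rr for b, Rr in zip(beta, R))
                - sum(math.log(a) for row in A for a in row))

    beta = [0.0] * n_r
    A = denominators(beta)
    for _ in range(iters):
        grad = [R[r] - sum(c[r][k] / row[k] for row in A for k in range(n_k))
                for r in range(n_r)]
        if max(abs(gr) for gr in grad) < tol:
            break
        hess = [[sum(c[r][k] * c[s][k] / row[k] ** 2 for row in A for k in range(n_k))
                 for s in range(n_r)] for r in range(n_r)]
        step = solve_linear(hess, [-gr for gr in grad])
        f0, t = dual(beta, A), 1.0
        while t > 1e-12:  # backtrack to keep every denominator positive
            trial = [b + t * s for b, s in zip(beta, step)]
            A_trial = denominators(trial)
            if (all(a > 0.0 for row in A_trial for a in row)
                    and dual(trial, A_trial) <= f0 + 1e-12 * (1.0 + abs(f0))):
                beta, A = trial, A_trial
                break
            t *= 0.5
    return beta, [[1.0 / a for a in row] for row in A]

# Hypothetical test problem: K = 2 signals, N = 16 frequencies, lags 0, 1, 2.
N = 16
freqs = [(k + 0.5) / (2 * N) for k in range(N)]
lags = [0, 1, 2]
c = [[2.0 * math.cos(2.0 * math.pi * t * f) for f in freqs] for t in lags]
priors = [[1.0 + 0.5 * math.cos(2.0 * math.pi * f) for f in freqs], [0.5] * N]
beta_true = [0.10, -0.05, 0.02]
Q_true = [[1.0 / (1.0 / P_i[k] + sum(b * c[r][k] for r, b in enumerate(beta_true)))
           for k in range(N)] for P_i in priors]
R = [sum(c[r][k] * Q_i[k] for Q_i in Q_true for k in range(N)) for r in range(len(lags))]

beta, Q = multisignal_mcesa(priors, c, R)
print(beta)
```

The autocorrelations R are generated from spectra that already have the form (18), so the iteration should recover beta_true. The backtracking step exists only to keep the trial spectra positive; no claim is made that it is the most efficient choice.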

When p and q are given by (13) and

$$ q(\mathbf{x}) = \prod_{i=1}^{K} \prod_{k=1}^{N} (1/Q_{ik}) \exp(-x_{ik}/Q_{ik}) $$

(cf. (17)), the cross entropy (8) can be calculated explicitly:

$$ H(q,p) = \sum_{i=1}^{K} \sum_{k=1}^{N} \left[ \frac{Q_{ik}}{P_{ik}} - \log \frac{Q_{ik}}{P_{ik}} - 1 \right] . \tag{20} $$

The quantity in brackets is a discrete analog of the Itakura-Saito distortion
measure [12], [16] of $P_i$ with respect to $Q_i$; cross-entropy minimization is
thus equivalent to choosing the $Q_{ik}$ so as to minimize the sum of
Itakura-Saito distortions. We obtain an alternative derivation of (18) by
minimizing the right side of (20) directly, subject to the constraints (19).
Namely, we form the expression

$$ \sum_{i=1}^{K} \sum_{k=1}^{N} \left[ \frac{Q_{ik}}{P_{ik}} - \log \frac{Q_{ik}}{P_{ik}} - 1 \right] + \sum_{r=0}^{M} \beta_r \sum_{i=1}^{K} \sum_{k=1}^{N} c_{rk} Q_{ik} $$

involving Lagrange multipliers $\beta_r$, and we set the partial derivative with
respect to each $Q_{ik}$ equal to zero:

$$ 1/P_{ik} - 1/Q_{ik} + \sum_{r=0}^{M} \beta_r c_{rk} = 0 . $$

This yields (18).
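
For readers who want the intermediate step behind the closed form (20), the calculation can be sketched as follows. It uses only the exponential forms of p and q and the fact that an exponential density with parameter $1/Q_{ik}$ has mean $Q_{ik}$; the layout of the steps is ours, not the authors'.

$$ \log \frac{q(\mathbf{x})}{p(\mathbf{x})} = \sum_{i=1}^{K} \sum_{k=1}^{N} \left[ \log \frac{P_{ik}}{Q_{ik}} - \frac{x_{ik}}{Q_{ik}} + \frac{x_{ik}}{P_{ik}} \right] , $$

$$ H(q,p) = \int d\mathbf{x} \, q(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})} = \sum_{i=1}^{K} \sum_{k=1}^{N} \left[ \log \frac{P_{ik}}{Q_{ik}} - \frac{Q_{ik}}{Q_{ik}} + \frac{Q_{ik}}{P_{ik}} \right] = \sum_{i=1}^{K} \sum_{k=1}^{N} \left[ \frac{Q_{ik}}{P_{ik}} - \log \frac{Q_{ik}}{P_{ik}} - 1 \right] , $$

where the second line replaces each $x_{ik}$ by its expectation $Q_{ik}$ under q.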


III.  PROPERTIES 

In  this  section  we  discuss  three  miscellaneous  properties  of  our 
multi-signal  method.  We  call  the  first  "order  preservation";  briefly,  it 
states  that  the  method  preserves  the  relative  magnitudes  of  the  priors.  The 
second,  "preservation  of  independence,"  is  related  to  the  assumption  of 
statistical independence of the processes $s_i$; it follows from a
generalization of the property of cross-entropy minimization that was called
"system independence" in [7] and [14]. The third is related to a phenomenon
that  we  call  "prior  washout"  and  that  occurs  when  a  posterior  resulting  from 
one  analysis  is  used  as  a  prior  for  a  subsequent  analysis;  we  compare  and 
contrast  the  behavior  of  the  single-  and  multi-signal  methods  in  this 
situation. 

A.  Order  Preservation 

Let $P_i$ and $P_j$ be two prior spectra and $Q_i$ and $Q_j$ be corresponding
posterior spectra resulting from a multi-signal MCESA analysis. The
order-preservation property is the observation that for each frequency $f_k$ we
have $Q_i < Q_j$, $Q_i = Q_j$, or $Q_i > Q_j$ if and only if $P_i < P_j$,
$P_i = P_j$, or $P_i > P_j$, respectively. This follows from the form of the
representation of the $Q_i$ in (5). The property accords well with intuition:
if we expect a priori that $s_i$ has greater power than $s_j$ at frequency $f_k$,
that expectation should not be altered by new information that concerns only
the sum of the two powers.
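
The step from (5) to order preservation can be spelled out in one line. The following is a sketch in the discrete notation of (18), assuming all priors and posteriors are positive at the frequency in question. Because the multiplier sum in the denominator is common to all signals, it cancels in the difference of reciprocals:

$$ \frac{1}{Q_{ik}} - \frac{1}{Q_{jk}} = \frac{1}{P_{ik}} - \frac{1}{P_{jk}} \quad\Longrightarrow\quad Q_{ik} - Q_{jk} = \frac{Q_{ik} Q_{jk}}{P_{ik} P_{jk}} \, (P_{ik} - P_{jk}) . $$

Since the factor multiplying $P_{ik} - P_{jk}$ is positive, $Q_{ik} - Q_{jk}$ has the same sign as $P_{ik} - P_{jk}$.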

B.  Preservation  of  Independence 

In (13), we wrote the prior probability density p in the form

$$ p(\mathbf{x}) = \prod_{i=1}^{K} p_i(\mathbf{x}_i) $$

(cf. (12)) to reflect the initial assumption that the $\mathbf{x}_i$ are independent.
Preservation of independence is the property that the posterior density q has
the same form,

$$ q(\mathbf{x}) = \prod_{i=1}^{K} q_i(\mathbf{x}_i) $$

(cf. (17)), so that the $\mathbf{x}_i$ remain independent after the prior density is
replaced by the posterior. This posterior independence would be a simple
consequence of the system independence property of [7] and [14] if the
constraints (16) were of the form

$$ R_r = \int d\mathbf{x} \, q^\dagger(\mathbf{x}) g_r(\mathbf{x}_{i(r)}) $$

-- that is, if each constraint involved only one of the sets $\mathbf{x}_i$ of variables
(where which set was involved might depend on the constraint). System
independence was one of the consistency axioms in [7]; informally, it states
that it doesn't matter whether independent constraint information about
separate systems with independent priors is accounted for separately, for each
system, or jointly, by treating the systems as one composite system. In the
present case, the constraints have the more general form

$$ R_r = \int d\mathbf{x} \, q^\dagger(\mathbf{x}) \sum_{i=1}^{K} g_{ri}(\mathbf{x}_i) $$

-- each constraint involves a linear combination of functions, each involving
one of the $\mathbf{x}_i$. Nevertheless, posterior independence still follows from
prior independence in this more general case.


C.  Prior  Washout 


The phenomenon we are here calling "prior washout" was mentioned in [14]
in connection with "Property 14." Property 14, in slightly specialized form,
states the following. Let p be a prior probability density. Let $I^{(1)}$ and
$I^{(2)}$ be sets of constraints of the form (7), but with the right side
replaced by $\bar{g}_r^{(1)}$ for $I^{(1)}$ and by $\bar{g}_r^{(2)}$ for $I^{(2)}$; that is, $I^{(1)}$
and $I^{(2)}$ both constrain the expectations of the same set of functions $g_r$,
but the expected values may differ. Then, in terms of the $\circ$ operator,

$$ (p \circ I^{(1)}) \circ I^{(2)} = p \circ I^{(2)} : $$

the effects of taking the information $I^{(1)}$ into account are completely
washed out when $I^{(2)}$ is taken into account.
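
Prior washout is easy to observe numerically. The sketch below is a toy check under invented assumptions: a five-state system, one constraint function g, and two target expectations, with the multiplier found by bisection. None of these specifics come from [14].

```python
import math

def min_cross_entropy(prior, g, g_bar, lo=-50.0, hi=50.0, iters=200):
    """Return q_k proportional to prior_k exp(-beta g_k) with sum_k q_k g_k = g_bar."""
    def tilted(beta):
        w = [pk * math.exp(-beta * gk) for pk, gk in zip(prior, g)]
        z = sum(w)
        return [wk / z for wk in w]

    def expectation(q):
        return sum(qk * gk for qk, gk in zip(q, g))

    # The expectation of g decreases monotonically as beta increases.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expectation(tilted(mid)) > g_bar:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Hypothetical data: the same function g is constrained in both analyses.
g = [0.0, 1.0, 2.0, 3.0, 4.0]
p = [0.40, 0.30, 0.15, 0.10, 0.05]

q1 = min_cross_entropy(p, g, 2.5)     # p o I(1)
q12 = min_cross_entropy(q1, g, 1.5)   # (p o I(1)) o I(2)
q2 = min_cross_entropy(p, g, 1.5)     # p o I(2)

washout_gap = max(abs(a - b) for a, b in zip(q12, q2))
print(washout_gap)  # expected to be at rounding-error level
```

The intermediate posterior q1 differs substantially from p, yet it leaves no trace in q12. The reason is visible in (9): both q12 and q2 are p times an exponential in the same g, so matching the same expectation forces the same total multiplier.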

One consequence of prior washout is a similar property of single-signal
MCESA. For definiteness, consider a speech-processing system; say we wish to
estimate the speech spectra $S^{(1)}$, $S^{(2)}$, ... in a succession of analysis
frames, and we can measure the speech autocorrelations $R_r^{(1)}$,
$R_r^{(2)}$, ... in these frames at a fixed set of lags $t_r$. Starting with a
prior spectral estimate P, suppose we form a posterior estimate $Q^{(i)}$ for a
frame i by taking the autocorrelation information $R_r^{(i)}$ into account.
Suppose we then use this posterior $Q^{(i)}$ as a prior estimate for a later
frame j and obtain a posterior estimate for that frame by taking $R_r^{(j)}$ into
account. Prior washout implies that the result is the same that we
would have gotten if we had used P instead of $Q^{(i)}$ as the prior estimate for
frame j; taking $R_r^{(j)}$ into account completely washes out the effects of
having taken $R_r^{(i)}$ into account.

This property has implications for certain noise-suppression schemes in
which one might envision using MCESA. Suppose that additive noise is present
in a speech-analysis system. It is often possible to detect whether or not
speech is present in an analysis frame. If frame i contains no speech, then
$Q^{(i)}$ is an estimate of the noise spectrum. Since the noise spectrum
contains information about part of what is likely to be present in a later
frame j that contains speech plus noise, it follows that using $Q^{(i)}$ as a
prior for frame j might result in more accurate estimation of the total
spectrum in that frame, thus allowing more accurate compensation for the
noise, say by subtraction of the noise spectrum. (On the other hand we might
worry that this procedure would unduly enhance the noise component of the
later estimate, thus further degrading the speech.) However, if the analyses
of frames i and j are based on the same set of autocorrelation lags, prior
washout occurs, and the use of $Q^{(i)}$ as a prior for frame j has no effect
whatever on the result of the analysis in frame j.

Although the same property holds for multi-signal MCESA, a combination of
single-signal and multi-signal MCESA can be used to avoid prior washout and
exploit the results of analyzing frames containing noise only. In particular,
during a frame when speech is absent, obtain an estimated noise spectrum by a
single-signal analysis. Use this spectrum as a prior noise estimate, together
with some other appropriate spectrum as a prior speech estimate, for a
multi-signal analysis in later frames. A procedure of this sort is
illustrated in section IV.

The reason that prior washout does not occur in this case is that the
initial computation of the estimated noise spectrum uses constraints on noise
autocorrelation values, while the subsequent computations use constraints on
total autocorrelations; thus different sets of functions are being
constrained. In fact, let $P_N$ be the prior used in obtaining the initial
estimated noise spectrum $Q_N^{(1)}$ by single-signal MCESA. Then $Q_N^{(1)}$ has
components at frequency $f_k$ of the form

$$ Q_{Nk}^{(1)} = \frac{1}{1/P_{Nk} + \sum_r \beta'_r c_{rk}} . $$

If $Q_N^{(1)}$ is used as a noise prior in later computations, and a spectrum
$P_S$ is used as a speech prior, the resulting noise and speech posteriors
$Q_N^{(2)}$ and $Q_S^{(2)}$ have the form

$$ Q_{Nk}^{(2)} = \frac{1}{1/P_{Nk} + \sum_r \beta'_r c_{rk} + \sum_r \beta_r c_{rk}} , \tag{21} $$

$$ Q_{Sk}^{(2)} = \frac{1}{1/P_{Sk} + \sum_r \beta_r c_{rk}} . \tag{22} $$

If $P_N$ were used in place of $Q_N^{(1)}$ in the later computations, the
resulting posteriors would have the form

$$ Q_{Nk}^{(2)} = \frac{1}{1/P_{Nk} + \sum_r \beta''_r c_{rk}} , \tag{23} $$

$$ Q_{Sk}^{(2)} = \frac{1}{1/P_{Sk} + \sum_r \beta''_r c_{rk}} . \tag{24} $$

Now, for linearly independent constraints, (21) and (23) are compatible only
if $\beta''_r = \beta_r + \beta'_r$ holds, and (22) and (24) are compatible only if
$\beta''_r = \beta_r$ holds. Thus the analog of prior washout will not in general
occur here unless $\beta'_r = 0$ holds -- that is, unless $Q_N^{(1)} = P_N$.


IV.  EXAMPLES 


In this section we present two numerical examples; in each, a given set of
data  is  analyzed  both  by  multi-signal  MCESA  and  by  either  single-signal  MCESA 
or  a  conventional  MESA  method.  In  the  first  example,  autocorrelations  at  a 
small  number  of  equally  spaced  lags  are  computed  from  the  sum  of  a  pair  of 
assumed  "true"  spectra,  and  single-  and  multi-signal  MCESA  estimates  are 
obtained  from  them.  In  the  second,  autocorrelations  are  estimated  from  sums 
of  speech-signal  and  noise  samples,  and  spectral  estimates  are  obtained  by 
MESA  and  multi-signal  MCESA. 

The assumed original spectra for the first example are a pair $S_B$ and
$S_S$, which we think of as a known "background" component and an unknown
"signal" component of the total spectrum. For numerical purposes we use the
spectral powers $S_{Bk}$ and $S_{Sk}$ at a hundred equally spaced frequencies $f_k$ =
±.005, ±.015, ..., ±.495 between -.5 and +.5 (the Nyquist band: we take the
spacing between autocorrelation lags to be unity). The background consists of
an approximation to white noise plus a peak corresponding to a sinusoid at
frequency .215:

$$ S_{Bk} = \begin{cases} 1.05, & f_k = \pm .215 \\ .05, & \text{otherwise.} \end{cases} $$

The signal term consists of a nearby, similar peak at frequency .165:

$$ S_{Sk} = \begin{cases} 1, & f_k = \pm .165 \\ 0, & \text{otherwise.} \end{cases} $$

Thus the total assumed spectrum $S_B + S_S$ is as shown (for positive
frequencies) in figure 1. Here are the corresponding autocorrelations $R_r$ at
six lags $t_r$ = 0, 1, ..., 5:

    t_r     0        1        2        3        4        5
    R_r   9.0000   1.4544  -2.7732  -3.2248   0.2032   2.6900
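
The tabulated autocorrelations can be reproduced from the assumed spectra with a few lines of code. The sketch below assumes that the sum in (15) runs over the 50 positive frequencies, with the factor 2 in $c_{rk}$ accounting for the mirrored negative ones; the variable and function names are ours.

```python
import math

# Positive half of the frequency grid: .005, .015, ..., .495
freqs = [(2 * j + 1) / 200.0 for j in range(50)]

S_B = [0.05] * 50      # background: approximately white noise ...
S_B[21] = 1.05         # ... plus the peak at f = .215
S_S = [0.0] * 50
S_S[16] = 1.0          # signal: peak at f = .165

def autocorrelation(spectra, freqs, lag):
    """R_r = sum_i sum_k c_rk S_ik with c_rk = 2 cos(2 pi t_r f_k)."""
    return sum(2.0 * math.cos(2.0 * math.pi * lag * f) * S[k]
               for S in spectra for k, f in enumerate(freqs))

R = [autocorrelation([S_B, S_S], freqs, t) for t in range(6)]
print(["%.4f" % r for r in R])
```

The white-noise floor contributes only at lag zero on this grid, so the values at lags 1 through 5 are determined entirely by the two peaks.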


For the multi-signal calculation, we use a pair of prior spectral
estimates $P_B$ and $P_S$. Since we are assuming prior knowledge of the
background spectral component $S_B$, we simply take $P_B = S_B$ as shown in
figure 2. To reflect prior ignorance of the signal component $S_S$, we take
$P_S$ to be uniform as in figure 3; for this example we have somewhat
arbitrarily normalized $P_S$ to have the same total power as $P_B$. For the
single-signal calculation, we use $P = P_B + P_S$ as the prior spectral
estimate.

Figure 4 shows the result of the single-signal analysis -- the MCESA
posterior estimate Q obtained from the prior estimate P and autocorrelations
$R_r$. Corresponding to the "known" peak at frequency .215 (which was included
in the prior) there is a sharp peak in the posterior at that frequency;
corresponding to the "unknown" peak at frequency .165 there is a maximum at
approximately that frequency that is broader than the first, but resolvable
from it.

The same original spectra $S_B$ and $S_S$ and the same autocorrelations $R_r$
were used in an example in [1]. There a MESA and an MCESA spectral estimate
were compared (see figures 5 and 6 in [1]). The MESA estimate failed to
resolve the two peaks and showed a single maximum at about the mid-frequency.
The MCESA estimate was based on $P_B$ instead of $P_B + P_S$ as a prior; the
result differed from figure 4, but was qualitatively similar. In both cases
the MCESA estimate implies the presence of the signal at frequency .165, but
does not provide a numerical estimate of the signal. Such an estimate is
provided by multi-signal MCESA.

The two individual posterior estimates $Q_B$ and $Q_S$ from the multi-signal
analysis are shown in figures 5 and 6. The sharp peak at frequency .215 is
seen to be correctly assigned entirely to the background posterior $Q_B$ --
unsurprisingly, since it was present in the background prior but not the
signal prior. The broader maximum corresponding to the original peak at
frequency .165 is present in the signal posterior $Q_S$ and is also present,
though less prominent, in $Q_B$. To understand why, qualitatively, consider
that the autocorrelations depend only on the total spectrum; the
autocorrelation constraints can equally well be satisfied by allocating
spectral power near frequency .165 to $Q_B$, or to $Q_S$. By the discussion in
section III, the relative magnitudes of the posteriors at each frequency
depend on the relative magnitudes of the priors. Both $P_B$ and $P_S$ are flat
near frequency .165, and because of the normalization chosen, $P_S$ is somewhat
greater there. Consequently, the broad maximum in $Q_S$ is somewhat greater
than that in $Q_B$.
[Figure: Multi-signal MCESA posterior estimate of background spectrum (first example); horizontal axis: frequencies.]

[Figure: Multi-signal MCESA posterior estimate of signal spectrum (first example); horizontal axis: frequencies.]


The  second  example  is  based  on  time-domain  samples  of  voiced  speech  and 
noise.  The  speech  comprises  a  portion  of  an  English  sentence  spoken  by  a  male 
speaker  and  includes  the  first  word,  "Sue,"  of  the  sentence  together  with 
silent  segments  before  and  after  it.  The  noise  consists  of  a  segment  of 
helicopter  noise  equal  in  duration  to  the  speech.  These  were  separately 
filtered,  sampled,  and  digitized  at  8000  samples  per  second.  The  speech  and 
noise  data  were  then  added  sample  by  sample,  resulting  in  samples  of  noisy 
speech.  These  samples  were  segmented  into  analysis  frames  of  180  samples,  and 
11 autocorrelations $R_r$, $r = 0, 1, \ldots, 10$ were estimated for each frame by
the formula

$$ R_r = \frac{1}{180} \sum_{j=1}^{180-r} s_j s_{j+r} , $$

where $s_j$ is the jth sample in the frame. This is a biased estimate but
guarantees positive-definiteness. No additional windowing or filtering was
used.
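
The per-frame estimate above translates directly into code. The following is a minimal sketch; the function name and the tiny four-sample frame are hypothetical, and a real frame in this example would hold 180 samples with max_lag = 10.

```python
def biased_autocorrelation(frame, max_lag):
    """Biased estimate R_r = (1/n) * sum_{j} s_j s_{j+r} for r = 0..max_lag,
    where n is the frame length (180 in the text) and the sum has n - r terms."""
    n = len(frame)
    return [sum(frame[j] * frame[j + r] for j in range(n - r)) / n
            for r in range(max_lag + 1)]

# Toy frame for illustration only.
R_toy = biased_autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=2)
print(R_toy)
```

Dividing by the full frame length n at every lag, rather than by the number of terms n - r, is what makes the estimate biased; it is also what keeps the resulting autocorrelation sequence positive definite.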

The  last  frame  before  the  actual  beginning  of  the  word  was  selected;  this 
frame  of  "noisy  speech"  thus  consisted  entirely  of  noise.  From  the 
autocorrelation  estimates  for  this  frame,  a  conventional  MESA  (i.e. 
uniform-prior  MCESA)  spectral  estimate  was  computed  for  use  as  a  prior 
estimate  of  the  noise  spectrum  in  subsequent  frames.  A  uniform  spectrum  was 
used  as  a  prior  estimate  for  the  speech  spectrum  in  the  subsequent  frames. 
These  two  priors  are  shown  in  figure  7.  Much  of  the  noise  power  is 
concentrated  in  a  peak  near  2780  Hz. 

From  the  two  priors  and  the  autocorrelation  estimates,  multi-signal  MCESA 
estimates  of  the  speech  and  noise  spectra  were  computed  for  later  frames. 

From  the  autocorrelation  estimates,  MESA  (LPC)  spectral  estimates  were 
computed for the noisy speech. We present the results for a selected frame of
voiced  speech — the  second  of  seven  that  span  the  vowel  /u/.  For  comparison 
with  these  results,  we  present  in  figure  8  a  MESA  estimate  of  the  uncorrupted 
speech.  This  was  computed  exactly  like  the  MESA  estimate  for  the  noisy  speech 
except that the $R_r$ were estimated from the speech samples only, not from the
sums  of  speech  and  noise  samples. 

The  MESA  estimate  for  the  noisy  speech  is  shown  in  figure  9.  This 
spectrum  agrees  rather  well  with  the  noise-free  estimate  in  the  band  from  0  up 
to  about  2000  Hz,  which  includes  the  first  two  formants.  Above  2000  Hz, 
however,  there  is  only  a  single  maximum;  the  third  and  fourth  formants  have 
merged  with  the  peak  in  the  noise  spectrum  to  form  a  single  peak  at  about  2690 
Hz. 

We  subtracted  the  noise  prior  (figure  7)  from  this  result  (figure  9).  The 
difference,  shown  in  figure  10,  represents  an  attempt  to  estimate  the  speech 
spectrum  by  a  MESA  analysis  and  spectral  subtraction.  The  subtracted  MESA 
spectrum  is  fairly  close  to  the  unsubtracted  MESA  spectrum  except  in  the 
neighborhood  of  the  noise  peak  at  2780  Hz.  Near  that  frequency,  the 
subtraction  so  far  overcompensates  that  the  difference  actually  assumes  rather 
large  negative  values.  (Absolute  values  are  plotted  in  the  figure.) 
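The subtraction step and its failure mode can be illustrated with a few lines of code. The numbers below are made up for illustration and are not taken from the figures; the function name is hypothetical.

```python
import numpy as np

def spectral_subtraction(noisy_spectrum, noise_prior):
    """Subtract a fixed noise estimate from a spectrum of signal plus noise.
    Returns the difference and a mask marking nonphysical (negative) values."""
    diff = np.asarray(noisy_spectrum, dtype=float) - np.asarray(noise_prior, dtype=float)
    return diff, diff < 0.0

# Toy values: the third bin sits on a noise peak that the prior overestimates.
noisy = np.array([4.0, 3.0, 5.0])
prior = np.array([1.0, 1.0, 9.0])
diff, negative = spectral_subtraction(noisy, prior)
plotted_db = 10.0 * np.log10(np.abs(diff))   # absolute values, as in figure 10
```

Nothing in the subtraction constrains the result to be positive, so wherever the noise prior exceeds the estimated sum spectrum the difference is negative.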

The  multi-signal  MCESA  posteriors  are  shown  in  figures  11  and  12;  figure 
11 is the speech, and figure 12 is the noise. Figure 11 shows a maximum near
2440  Hz,  about  130  Hz  higher  than  the  third  formant,  and  a  suggestion  of  the 
fourth  formant  is  discernible.  Except  for  frequencies  near  the  noise  peak, 
the  multi-signal  speech  spectrum  (figure  11)  and  the  subtracted  MESA  result 
(figure  10)  are  quite  close,  the  multi-signal  result  being  usually  the  closer 
of the two to the estimate based on noise-free data (figure 8). Near 2780 Hz,
the multi-signal result is substantially closer, and where the subtracted MESA
becomes negative, the multi-signal estimate takes only physically meaningful
positive values. Both methods underestimate the total power near 2780 Hz (cf.
figure 12); however, the multi-signal method apportions the total between
speech and noise in a somewhat reasonable way, whereas the other does not.

Fig. 10 — Result of subtracting noise prior (see figure 7) from
spectrum in figure 9 (second example)

Fig. 11 — Multi-signal MCESA posterior estimate of speech spectrum
(second example)

Fig. 12 — Multi-signal MCESA posterior estimate of noise spectrum
(second example)

V.  DISCUSSION  AND  CONCLUSIONS 

Multi-signal  MCESA  is  a  new  spectrum-estimation  method  based  on  a  provably 
optimal information-theoretic inductive-inference procedure. When separate
prior  estimates  are  available  for  the  power  spectra  of  two  or  more  processes, 
and  new  information  is  obtained  in  the  form  of  values  of  the  autocorrelation 
function  of  their  sum,  the  method  yields  separate  posterior  estimates.  One 
suggested  application  is  separating  the  spectrum  of  a  signal  from  that  of 
additive  noise.  By  incorporating  prior  estimates  for  both  signal  and  noise 
spectra,  the  multi-signal  method  offers  considerable  scope  and  flexibility  for 
tailoring  an  estimator  to  the  characteristics  of  a  signal  or  noise. 

In  the  second  example  in  section  IV  we  contrasted  this  method  with  a  more 
ad-hoc  method  for  taking  a  prior  noise  estimate  into  account  —  estimate  the 
sum  of  signal  and  noise  spectra  from  autocorrelations  and  then  subtract  the 
prior  noise  estimate.  The  latter  method  seems  to  imply  an  unwarranted 
absolute  commitment  to  the  noise-spectrum  estimate:  adjustments  to  the 
signal-spectrum  estimate  are  made  solely  responsible  for  fitting  the 
autocorrelation  of  the  sum  to  measured  values.  The  multi-signal  method,  by 
contrast,  adjusts  both  noise  and  signal  estimates  in  fitting  the 
autocorrelation  of  the  sum.  We  saw  that  the  multi-signal  method  could  thereby 
avoid  nonphysical  (negative)  estimates  that  can  result  from  spectral 
subtraction. Of course, whether or not the multi-signal method can improve
speech quality must be determined by systematic experiments involving speech
synthesis and intelligibility testing. We hope to do such experiments in the
future. 

In  the  same  example,  a  prominent  noise  peak  was  present  in  the  sum 
spectrum. Most of the power in it was properly attributed to the noise
spectrum in the posteriors, but substantial leakage into the signal (speech)
spectrum occurred, at a level a few dB lower. The relative apportionment of the power in
that  peak  between  the  signal  and  noise  posteriors  would  be  substantially 
altered  by  a  change  in  the  level  of  the  uniform  spectrum  that  was  used  as  the 
speech  prior.  This  is  in  contrast  to  single-signal  MCESA,  where  all  uniform 
priors  give  equivalent  results  (as  long  as  one  of  the  constrained 
autocorrelations  is  the  total  power).  How  best  to  choose  the  level  of  this 
uniform  prior  relative  to  the  noise  prior  is  a  question  not  yet  answered. 
Indeed,  since  the  signal  is  known  to  be  speech,  it  would  undoubtedly  be 
beneficial  to  replace  the  uniform  signal  prior  with  one  tailored  to  the 
characteristics  of  speech.  How  best  to  do  this  tailoring  is  another 
unanswered  question.  In  short,  there  is  much  to  be  learned  about  how  to 
choose  the  prior  estimates  to  reflect  our  prior  knowledge  of  signals  and  noise 
in  practical  situations. 